Part 1 of the Pearson Live Training Session “Hands-On Data Visualization with ggplot2” for O’Reilly
{ggplot2} Package
{ggplot2} is a system for declaratively creating graphics,
based on “The Grammar of Graphics” (Wilkinson, 2005). You provide the data, tell {ggplot2} how to map variables to aesthetics, what graphical primitives to use, and it takes care of the details.
ggplot2 is a data visualization package
for the programming language R created by Hadley
Wickham.
It should already be installed on your system (if not, run the first
line in the following chunk). The functionality of the package can be
loaded by calling library(), as for any other package:
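A minimal version of that chunk, assuming you install from CRAN. The installation line is commented out so that it only runs when you actually need it:

```r
# install.packages("ggplot2")  # only needed once, if the package is missing
library(ggplot2)               # attach the package for the current session
```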
ggplot2 is part of the tidyverse package
collection. Thus, you can also load the tidyverse instead of
running library(ggplot2):
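A sketch, assuming the tidyverse is installed. Attaching the collection also attaches ggplot2 together with the other core packages (readr, dplyr, tibble, tidyr, purrr, stringr, forcats):

```r
# attaches ggplot2 along with the other core tidyverse packages
library(tidyverse)
```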
We use cryptocurrency financial data, pulled from CoinMarketCap.com. For our purposes, we limit the data to the period 08/2017–12/2019 and the top 4 cryptocurrencies.
I have already prepared the data. If you want to know how, you can have a look here.
Using the read_csv() function from the
{readr} package, we can read the data directly from the
web:
url <- "https://raw.githubusercontent.com/z3tt/hands-on-ggplot2/main/data/crypto_cleaned.csv"
data <- readr::read_csv(url)
data
# A tibble: 2,812 x 9
currency date open high low close year month yday
<chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 binance-coin 2019-12-04 15.4 15.7 15.0 15.3 2019 12 338
2 binance-coin 2019-12-03 15.2 15.6 15.0 15.3 2019 12 337
3 binance-coin 2019-12-02 15.5 15.7 15.2 15.2 2019 12 336
4 binance-coin 2019-12-01 15.7 15.7 15.0 15.5 2019 12 335
5 binance-coin 2019-11-30 16.3 16.4 15.5 15.7 2019 11 334
6 binance-coin 2019-11-29 15.7 16.3 15.6 16.3 2019 11 333
7 binance-coin 2019-11-28 16.1 16.2 15.6 15.7 2019 11 332
8 binance-coin 2019-11-27 15.5 16.2 14.9 16.1 2019 11 331
9 binance-coin 2019-11-26 15.3 15.9 15.2 15.5 2019 11 330
10 binance-coin 2019-11-25 15.3 15.7 14.2 15.3 2019 11 329
# ... with 2,802 more rows
Of course, one can import local files as well:
data_local <- readr::read_csv("data/crypto_cleaned.csv")
data_local
# A tibble: 2,812 x 9
currency date open high low close year month yday
<chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 binance-coin 2019-12-04 15.4 15.7 15.0 15.3 2019 12 338
2 binance-coin 2019-12-03 15.2 15.6 15.0 15.3 2019 12 337
3 binance-coin 2019-12-02 15.5 15.7 15.2 15.2 2019 12 336
4 binance-coin 2019-12-01 15.7 15.7 15.0 15.5 2019 12 335
5 binance-coin 2019-11-30 16.3 16.4 15.5 15.7 2019 11 334
6 binance-coin 2019-11-29 15.7 16.3 15.6 16.3 2019 11 333
7 binance-coin 2019-11-28 16.1 16.2 15.6 15.7 2019 11 332
8 binance-coin 2019-11-27 15.5 16.2 14.9 16.1 2019 11 331
9 binance-coin 2019-11-26 15.3 15.9 15.2 15.5 2019 11 330
10 binance-coin 2019-11-25 15.3 15.7 14.2 15.3 2019 11 329
# ... with 2,802 more rows
This assumes that you have placed the file in a folder called
data in your working directory.
You can set the working directory via setwd() or, preferably,
use R projects.
The namespace operator :: allows you to access a function from a package directly, without attaching the package via library() first:
packagename::function(argument)
Furthermore, it helps readers understand which package a function comes from.
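To see the operator in action with a package that ships with every R installation, here is a small sketch using tools::file_ext(), which returns the extension of a file name. The file name string is only used as an example input:

```r
# call file_ext() from {tools} without attaching the package via library()
ext <- tools::file_ext("crypto_cleaned.csv")
ext
#> [1] "csv"

# {tools} is now loaded, but it is not attached to the search path
"tools" %in% loadedNamespaces()
"package:tools" %in% search()
```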
We need to specify the data in the ggplot() call:
ggplot(data = data)
There is only an empty panel because ggplot2 doesn’t
know which variables of the data it should plot.
We need to specify two variables we want to plot as positional
aesthetics:
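With the course data this would be ggplot(data = data, mapping = aes(x = date, y = close)). To keep the sketch self-contained, it uses the economics dataset that ships with ggplot2 (monthly US data with a date column) instead of the crypto data:

```r
library(ggplot2)

# map date to the x position and unemploy to the y position;
# without a layer, only the axes and the grid are drawn
p_long <- ggplot(data = economics, mapping = aes(x = date, y = unemploy))
p_long
```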
The panel now shows axes for the two variables, but still no data,
because ggplot2 doesn’t know how it should represent the data.
Thanks to positional matching of arguments in ggplot() and
aes(), we can also write:
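A sketch of the short form, again with the economics dataset from ggplot2 standing in for the course data (with the course data: ggplot(data, aes(date, close))). data is the first argument of ggplot() and mapping the second, while x and y are the first two arguments of aes():

```r
library(ggplot2)

# arguments matched by position instead of by name
p_short <- ggplot(economics, aes(date, unemploy))

# the mapping still contains named x and y aesthetics
names(p_short$mapping)
#> [1] "x" "y"
```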
By adding one or multiple layers, we can tell ggplot2
how to represent the data. There are lots of built-in geometric
elements (geoms) and statistical transformations
(stats).
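One way to get an overview of what is available is to list the functions exported by ggplot2 whose names start with geom_ or stat_. A sketch using base R's getNamespaceExports():

```r
# all functions and objects exported by {ggplot2}
exported <- getNamespaceExports("ggplot2")

# keep only the geometric elements and the statistical transformations
geoms    <- sort(grep("^geom_", exported, value = TRUE))
stat_fns <- sort(grep("^stat_", exported, value = TRUE))

length(geoms)
head(geoms)
head(stat_fns)
```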
We can tell ggplot2 to represent the data, for example, as
a scatter plot:
ggplot(data, aes(date, close)) +
geom_point()
Aesthetics refer not only to x and y positions but also to groupings, colors, fills, shapes, etc.
ggplot(data = data, mapping = aes(x = date, y = close, color = currency)) +
geom_point()
You can replace the default theme with one of the other built-in
themes via theme_set(). Note that you can also adjust
some global settings, for example the base_size, which
defaults to 11 and is often too small.
theme_set(theme_light(base_size = 18))
Once you call theme_set(), the new theme is used for every plot
you create from then on! Give it a try: go back to the last chunk and
re-run the code to generate the colored scatter plot.
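theme_set() also returns the previously active theme (invisibly), so you can store it and switch back later. A minimal sketch:

```r
library(ggplot2)

# set a new global theme and keep the one that was active before
old_theme <- theme_set(theme_light(base_size = 18))

# the new base size is now part of the global theme
size_during <- theme_get()$text$size
size_during

# restore the previous theme
theme_set(old_theme)
```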
The exciting thing about layers is that you can combine several
geom_*() and stat_*() calls:
ggplot(data, aes(date, close, color = currency)) +
geom_line() +
geom_point()
… and aesthetics can be applied either globally:
ggplot(data, aes(date, close, color = currency, shape = currency)) +
geom_line() +
geom_point()
… or for each layer individually:
ggplot(data, aes(date, close)) +
geom_line(aes(color = currency)) +
geom_point(aes(shape = currency))
chic <- readr::read_csv(
"https://raw.githubusercontent.com/z3tt/ggplot-courses/master/data/chicago-nmmaps.csv"
)
temp) versus day
(date).season).year).
You can export your plot via the ggsave() function:
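A minimal sketch, using the economics dataset from ggplot2 and a file in a temporary folder as placeholders (in a project you would rather use a path like "plots/my_plot.png"). The file type is derived from the extension, and if plot is omitted, ggsave() saves the last plot displayed:

```r
library(ggplot2)

p <- ggplot(economics, aes(date, unemploy)) +
  geom_line()

# placeholder output path in a temporary folder
out_file <- file.path(tempdir(), "unemployment.pdf")

# width and height are given in inches by default
ggsave(out_file, plot = p, width = 8, height = 5)
```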
-> Scales, Coordinate Systems, Facets, Themes, and Annotations will follow later
“ggplot2: Elegant Graphics for Data Analysis”, free-access book by Hadley Wickham et al.
“R for Data Science”, free-access book by Hadley Wickham and Garrett Grolemund
“Data Visualization: A Practical Introduction”, free-access book by Kieran Healy
“A {ggplot2} Tutorial for Beautiful Plotting in R”, my extensive “how to” tutorial
{here} Package
A good workflow when working with local files is offered by the
{here} package in combination with R projects:
here::here()
[1] "C:/Users/DataVizard/Google Drive/Work/DataViz/Teaching/2022_OReilly_Trainings/hands-on-ggplot2-training"
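The project root returned by here::here() can be extended with sub-folders and file names. A sketch, assuming the CSV file sits in a data folder inside your project (the import only runs if the file exists):

```r
# here::here() builds paths relative to the project root,
# independent of the folder the current .Rmd file is located in
path <- here::here("data", "crypto_cleaned.csv")
path

# assumes the file exists in a data folder inside your project
if (file.exists(path)) {
  data <- readr::read_csv(path)
}
```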
# A tibble: 2,812 x 9
currency date open high low close year month yday
<chr> <date> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 binance-coin 2019-12-04 15.4 15.7 15.0 15.3 2019 12 338
2 binance-coin 2019-12-03 15.2 15.6 15.0 15.3 2019 12 337
3 binance-coin 2019-12-02 15.5 15.7 15.2 15.2 2019 12 336
4 binance-coin 2019-12-01 15.7 15.7 15.0 15.5 2019 12 335
5 binance-coin 2019-11-30 16.3 16.4 15.5 15.7 2019 11 334
6 binance-coin 2019-11-29 15.7 16.3 15.6 16.3 2019 11 333
7 binance-coin 2019-11-28 16.1 16.2 15.6 15.7 2019 11 332
8 binance-coin 2019-11-27 15.5 16.2 14.9 16.1 2019 11 331
9 binance-coin 2019-11-26 15.3 15.9 15.2 15.5 2019 11 330
10 binance-coin 2019-11-25 15.3 15.7 14.2 15.3 2019 11 329
# ... with 2,802 more rows
The base R function read.csv() works in the same way as
readr::read_csv():
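A self-contained sketch of that comparison, using the small example CSV that {readr} ships (readr_example("mtcars.csv")) instead of the course file. Both functions return the same rows and columns, but read.csv() gives a plain data frame while read_csv() gives a tibble:

```r
# example file bundled with {readr}
path <- readr::readr_example("mtcars.csv")

df_base  <- read.csv(path)
df_readr <- readr::read_csv(path, show_col_types = FALSE)

# same dimensions, different classes
dim(df_base)
dim(df_readr)
class(df_base)
class(df_readr)
```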
currency date open high low close year month yday
1 binance-coin 2019-12-04 15.35 15.69 15.01 15.28 2019 12 338
2 binance-coin 2019-12-03 15.19 15.55 15.05 15.31 2019 12 337
3 binance-coin 2019-12-02 15.51 15.71 15.15 15.19 2019 12 336
4 binance-coin 2019-12-01 15.74 15.74 15.05 15.50 2019 12 335
5 binance-coin 2019-11-30 16.26 16.37 15.54 15.72 2019 11 334
6 binance-coin 2019-11-29 15.68 16.34 15.65 16.27 2019 11 333
… and we can turn it into a tibble afterwards:
data <- tibble::as_tibble(data)
data
# A tibble: 2,812 x 9
currency date open high low close year month yday
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <int> <int> <int>
1 binance-coin 2019-12-04 15.4 15.7 15.0 15.3 2019 12 338
2 binance-coin 2019-12-03 15.2 15.6 15.0 15.3 2019 12 337
3 binance-coin 2019-12-02 15.5 15.7 15.2 15.2 2019 12 336
4 binance-coin 2019-12-01 15.7 15.7 15.0 15.5 2019 12 335
5 binance-coin 2019-11-30 16.3 16.4 15.5 15.7 2019 11 334
6 binance-coin 2019-11-29 15.7 16.3 15.6 16.3 2019 11 333
7 binance-coin 2019-11-28 16.1 16.2 15.6 15.7 2019 11 332
8 binance-coin 2019-11-27 15.5 16.2 14.9 16.1 2019 11 331
9 binance-coin 2019-11-26 15.3 15.9 15.2 15.5 2019 11 330
10 binance-coin 2019-11-25 15.3 15.7 14.2 15.3 2019 11 329
# ... with 2,802 more rows
However, note that read.csv() imports the date column as type
character by default.
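A self-contained sketch of two ways to get a proper Date column with read.csv(): convert the column afterwards with as.Date(), or declare its type up front via colClasses. The temporary CSV holds the first two rows shown above (reduced to three columns) and stands in for the course file:

```r
# a tiny stand-in for the course file, using rows shown above
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "currency,date,close",
  "binance-coin,2019-12-04,15.28",
  "binance-coin,2019-12-03,15.31"
), tmp)

# by default, the date column is read as character
data_chr <- read.csv(tmp)
class(data_chr$date)

# option 1: convert the column after importing
data_chr$date <- as.Date(data_chr$date)

# option 2: declare the column type when importing
data_date <- read.csv(tmp, colClasses = c(date = "Date"))
class(data_date$date)
```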
The import() function from the {rio}
package allows you to load all kinds of data formats:
#install.packages("rio")
data <- rio::import(here::here("data", "crypto_cleaned.csv"))
head(data) ## use head() because the full output is very long
currency date open high low close year month yday
1 binance-coin 2019-12-04 15.35 15.69 15.01 15.28 2019 12 338
2 binance-coin 2019-12-03 15.19 15.55 15.05 15.31 2019 12 337
3 binance-coin 2019-12-02 15.51 15.71 15.15 15.19 2019 12 336
4 binance-coin 2019-12-01 15.74 15.74 15.05 15.50 2019 12 335
5 binance-coin 2019-11-30 16.26 16.37 15.54 15.72 2019 11 334
6 binance-coin 2019-11-29 15.68 16.34 15.65 16.27 2019 11 333
We can turn it into a tibble afterwards, or specify the class directly when importing the data set:
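A self-contained sketch of the second option, using the setclass argument of rio::import(). A tiny temporary CSV built from the first rows shown above stands in for the course file (for which the call would presumably be rio::import(here::here("data", "crypto_cleaned.csv"), setclass = "tibble")):

```r
# a tiny stand-in for the course file, using the first two rows shown above
tmp <- tempfile(fileext = ".csv")
write.csv(
  data.frame(
    currency = "binance-coin",
    date = c("2019-12-04", "2019-12-03"),
    close = c(15.28, 15.31)
  ),
  tmp, row.names = FALSE
)

# setclass controls the class of the returned object
data_tbl <- rio::import(tmp, setclass = "tibble")
data_tbl
```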
# A tibble: 2,812 x 9
currency date open high low close year month yday
<chr> <date> <dbl> <dbl> <dbl> <dbl> <int> <int> <int>
1 binance-coin 2019-12-04 15.4 15.7 15.0 15.3 2019 12 338
2 binance-coin 2019-12-03 15.2 15.6 15.0 15.3 2019 12 337
3 binance-coin 2019-12-02 15.5 15.7 15.2 15.2 2019 12 336
4 binance-coin 2019-12-01 15.7 15.7 15.0 15.5 2019 12 335
5 binance-coin 2019-11-30 16.3 16.4 15.5 15.7 2019 11 334
6 binance-coin 2019-11-29 15.7 16.3 15.6 16.3 2019 11 333
7 binance-coin 2019-11-28 16.1 16.2 15.6 15.7 2019 11 332
8 binance-coin 2019-11-27 15.5 16.2 14.9 16.1 2019 11 331
9 binance-coin 2019-11-26 15.3 15.9 15.2 15.5 2019 11 330
10 binance-coin 2019-11-25 15.3 15.7 14.2 15.3 2019 11 329
# ... with 2,802 more rows
You could also load, for example, JSON or Excel files with the same function:
data_json <- rio::import(here::here("data", "crypto_cleaned.json"))
data_json <- tibble::as_tibble(data_json) ## `setclass` doesn't seem to work with JSON files
data_json
# A tibble: 2,812 x 9
currency date open high low close year month yday
<chr> <int> <dbl> <dbl> <dbl> <dbl> <int> <int> <int>
1 binance-coin 18234 15.4 15.7 15.0 15.3 2019 12 338
2 binance-coin 18233 15.2 15.6 15.0 15.3 2019 12 337
3 binance-coin 18232 15.5 15.7 15.2 15.2 2019 12 336
4 binance-coin 18231 15.7 15.7 15.0 15.5 2019 12 335
5 binance-coin 18230 16.3 16.4 15.5 15.7 2019 11 334
6 binance-coin 18229 15.7 16.3 15.6 16.3 2019 11 333
7 binance-coin 18228 16.1 16.2 15.6 15.7 2019 11 332
8 binance-coin 18227 15.5 16.2 14.9 16.1 2019 11 331
9 binance-coin 18226 15.3 15.9 15.2 15.5 2019 11 330
10 binance-coin 18225 15.3 15.7 14.2 15.3 2019 11 329
# ... with 2,802 more rows
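Note that the date column of the JSON import is an integer: the number of days since 1970-01-01 (18234 corresponds to 2019-12-04, the first date in the CSV version). A sketch of the conversion, using a small tibble with the first three values shown above in place of data_json:

```r
library(tibble)

# integer dates as in the JSON import above
json_like <- tibble(
  currency = "binance-coin",
  date = c(18234L, 18233L, 18232L)
)

# convert days since 1970-01-01 into proper dates
json_like$date <- as.Date(json_like$date, origin = "1970-01-01")
json_like
```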
# A tibble: 2,812 x 10
...1 currency date open high low close year
<chr> <chr> <dttm> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1 binance-co~ 2019-12-04 00:00:00 15.4 15.7 15.0 15.3 2019
2 2 binance-co~ 2019-12-03 00:00:00 15.2 15.6 15.0 15.3 2019
3 3 binance-co~ 2019-12-02 00:00:00 15.5 15.7 15.2 15.2 2019
4 4 binance-co~ 2019-12-01 00:00:00 15.7 15.7 15.0 15.5 2019
5 5 binance-co~ 2019-11-30 00:00:00 16.3 16.4 15.5 15.7 2019
6 6 binance-co~ 2019-11-29 00:00:00 15.7 16.3 15.6 16.3 2019
7 7 binance-co~ 2019-11-28 00:00:00 16.1 16.2 15.6 15.7 2019
8 8 binance-co~ 2019-11-27 00:00:00 15.5 16.2 14.9 16.1 2019
9 9 binance-co~ 2019-11-26 00:00:00 15.3 15.9 15.2 15.5 2019
10 10 binance-co~ 2019-11-25 00:00:00 15.3 15.7 14.2 15.3 2019
# ... with 2,802 more rows, and 2 more variables: month <dbl>,
# yday <dbl>
We can remove the first column by using the select()
function from the {dplyr} package:
data_xlsx <- dplyr::select(data_xlsx, -1)
#data_xlsx <- dplyr::select(data_xlsx, currency:yday)
data_xlsx
# A tibble: 2,812 x 9
currency date open high low close year month
<chr> <dttm> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 binance-co~ 2019-12-04 00:00:00 15.4 15.7 15.0 15.3 2019 12
2 binance-co~ 2019-12-03 00:00:00 15.2 15.6 15.0 15.3 2019 12
3 binance-co~ 2019-12-02 00:00:00 15.5 15.7 15.2 15.2 2019 12
4 binance-co~ 2019-12-01 00:00:00 15.7 15.7 15.0 15.5 2019 12
5 binance-co~ 2019-11-30 00:00:00 16.3 16.4 15.5 15.7 2019 11
6 binance-co~ 2019-11-29 00:00:00 15.7 16.3 15.6 16.3 2019 11
7 binance-co~ 2019-11-28 00:00:00 16.1 16.2 15.6 15.7 2019 11
8 binance-co~ 2019-11-27 00:00:00 15.5 16.2 14.9 16.1 2019 11
9 binance-co~ 2019-11-26 00:00:00 15.3 15.9 15.2 15.5 2019 11
10 binance-co~ 2019-11-25 00:00:00 15.3 15.7 14.2 15.3 2019 11
# ... with 2,802 more rows, and 1 more variable: yday <dbl>
Some prefer to place the aes() outside the
ggplot() call:
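A sketch of that style, using the economics dataset that ships with ggplot2 instead of the crypto data (with the course data: ggplot(data) + aes(x = date, y = close) + ...). The aes() call is added as its own component with +:

```r
library(ggplot2)

# aes() added as a separate component instead of inside ggplot()
p <- ggplot(economics) +
  aes(x = date, y = unemploy) +
  geom_line()
p
```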
The coordinate system maps the two position aesthetics (x and y) to a 2D position on the plot:
ggplot(data, aes(x = date, y = close,
color = currency)) +
geom_line() +
geom_point() +
scale_x_date() +
scale_y_continuous() +
scale_color_discrete() +
coord_cartesian()
ggplot(data, aes(x = date, y = close,
color = currency)) +
geom_line() +
geom_point() +
scale_x_date() +
scale_y_continuous() +
scale_color_discrete() +
coord_polar()
Changing the limits on the coordinate system allows you to zoom in:
ggplot(data, aes(x = date, y = close,
color = currency)) +
geom_line() +
geom_point() +
scale_x_date() +
scale_y_continuous() +
scale_color_discrete() +
coord_cartesian(
xlim = c(as.Date("2018-11-01"),
as.Date("2019-11-01")),
ylim = c(NA, 100)
)
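Zooming via the coordinate system differs from setting limits on a scale: scale limits turn values outside the range into NA (they are then dropped, with a warning, when the plot is drawn), while coord_cartesian() keeps all data and only changes the visible area. A small check of that difference, using the built-in mtcars data and ggplot2's layer_data():

```r
library(ggplot2)

base <- ggplot(mtcars, aes(wt, mpg)) + geom_point()

# limits on the scale: values outside 10 to 25 mpg become NA
p_scale <- base + scale_y_continuous(limits = c(10, 25))

# limits on the coordinate system: all values are kept, the view is zoomed
p_coord <- base + coord_cartesian(ylim = c(10, 25))

# layer_data() returns the data of the first layer after building the plot
sum(is.na(layer_data(p_scale)$y))
sum(is.na(layer_data(p_coord)$y))
```

The difference matters most for layers that compute something from the data, such as smoothers or boxplots, because removing data via scale limits changes their results.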
[1] "2022-06-25 13:48:26 CEST"
Local: main C:/Users/DataVizard/Google Drive/Work/DataViz/Teaching/2022_OReilly_Trainings/hands-on-ggplot2-training
Remote: main @ origin (https://github.com/z3tt/hands-on-ggplot2-training.git)
Head: [1cc6183] 2022-02-18: fix typo
R version 4.1.0 (2021-05-18)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19043)
Matrix products: default
locale:
[1] LC_COLLATE=German_Germany.1252 LC_CTYPE=German_Germany.1252
[3] LC_MONETARY=German_Germany.1252 LC_NUMERIC=C
[5] LC_TIME=German_Germany.1252
system code page: 65001
attached base packages:
[1] stats graphics grDevices utils datasets methods
[7] base
other attached packages:
[1] forcats_0.5.1 stringr_1.4.0 dplyr_1.0.7 purrr_0.3.4
[5] readr_2.0.2 tidyr_1.1.4 tibble_3.1.6 ggplot2_3.3.5
[9] tidyverse_1.3.1
loaded via a namespace (and not attached):
[1] httr_1.4.2 sass_0.4.0 bit64_4.0.5
[4] vroom_1.5.5 jsonlite_1.7.2 here_1.0.1
[7] modelr_0.1.8 bslib_0.3.1 assertthat_0.2.1
[10] highr_0.9 cellranger_1.1.0 yaml_2.2.1
[13] pillar_1.6.4 backports_1.2.1 glue_1.4.2
[16] digest_0.6.29 rvest_1.0.2 colorspace_2.0-2
[19] htmltools_0.5.2 pkgconfig_2.0.3 broom_0.8.0
[22] haven_2.4.3 scales_1.1.1 openxlsx_4.2.5
[25] rio_0.5.29 distill_1.3 tzdb_0.1.2
[28] downlit_0.4.0 git2r_0.29.0 generics_0.1.1
[31] farver_2.1.0 ellipsis_0.3.2 cachem_1.0.6
[34] withr_2.4.3 cli_3.1.0 magrittr_2.0.1
[37] crayon_1.4.2 readxl_1.3.1 memoise_2.0.1
[40] evaluate_0.15 fs_1.5.0 fansi_0.5.0
[43] foreign_0.8-81 xml2_1.3.2 textshaping_0.3.6
[46] data.table_1.14.2 tools_4.1.0 hms_1.1.1
[49] lifecycle_1.0.1 munsell_0.5.0 reprex_2.0.1
[52] zip_2.2.0 compiler_4.1.0 jquerylib_0.1.4
[55] systemfonts_1.0.3 rlang_1.0.2 grid_4.1.0
[58] rstudioapi_0.13 labeling_0.4.2 rmarkdown_2.11
[61] gtable_0.3.0 DBI_1.1.2 curl_4.3.2
[64] R6_2.5.1 lubridate_1.8.0 knitr_1.39
[67] fastmap_1.1.0 bit_4.0.4 utf8_1.2.2
[70] rprojroot_2.0.2 ragg_1.1.3 stringi_1.7.5
[73] parallel_4.1.0 Rcpp_1.0.7 vctrs_0.3.8
[76] dbplyr_2.1.1 tidyselect_1.1.1 xfun_0.31